All Questions
Tagged with python reinforcement-learning
59 questions
0 votes
0 answers
20 views
Python - adding more timesteps makes my model "fail"
Hi! I have just made my first model in stable-baselines3 using pygame in Python. The game is about a ball reaching the highest platform out of three placed in the sky. Now - after a few days of trying ...
0 votes
1 answer
164 views
Reward not improving for a custom environment using PPO
I've been trying to train an agent on a custom environment I implemented with gym where the goal is to resolve voltage violations in a power grid by adjusting the active power (loads) at each node. I ...
1 vote
1 answer
87 views
Deep RL problem: Loss decreases but agent doesn't learn
I'm implementing a basic Vanilla Policy Gradient algorithm for the CartPole-v1 gymnasium environment, and I don't know what I'm doing wrong. No matter what I try, during the training loop the loss ...
1 vote
0 answers
22 views
Optimizing Wind Park Layout Using Direct Action-to-Input Mapping
I’m optimizing a black-box objective function where the task is to find the optimal turbine locations in a wind park. Previously, I used a PPO reinforcement learning approach with a step-by-step ...
1 vote
1 answer
156 views
How Do I Optimise a Black-Box Objective Function with DQN Using Reinforcement Learning?
I'm a beginner in the field of reinforcement learning, and I'm currently working on a problem that has me a bit stuck. I'm trying to optimize a black-box objective function using reinforcement ...
0 votes
1 answer
289 views
Why is PPO not choosing a solution that is giving a higher cumulative reward?
I use PPO to train my fermenter (digital twin) to maximize enzyme (product) production. Action: 1 or 0, i.e., add substrate at a particular time or not, based on the cells and enzymes present in the tank ...
1 vote
0 answers
177 views
Python libraries for multi-armed bandit problems [closed]
I am working on a problem that can be cast as a contextual bandit problem with a continuous action space. I would like to tackle it by using something like the contextual zooming algorithm from the ...
2 votes
1 answer
67 views
How does reward work while training a Reinforcement Learning agent?
I am using PPO to train my environment, which I created using Stable Baselines 3. I am unsure whether I should set the reward = 0 in the step function or not. Initially, I used to have self.reward = 0 in ...
1 vote
1 answer
158 views
Why are these two implementations of the $\epsilon$-greedy policy different?
According to the book Reinforcement Learning: An Introduction, the epsilon-greedy policy can generally be implemented as: $$ \pi(a|s) = \begin{cases} \frac{\epsilon}{|A|} + 1 - \epsilon & \text{if } ...
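The visible branch of the formula in this excerpt gives the greedy action probability $\frac{\epsilon}{|A|} + 1 - \epsilon$. The rest is truncated, so the following is only a minimal sketch that assumes the usual textbook form, in which every non-greedy action gets $\frac{\epsilon}{|A|}$. The function names `epsilon_greedy_probs` and `sample_action` and the tabular `q_values` input are hypothetical and are not taken from the question.

```python
import numpy as np


def epsilon_greedy_probs(q_values, epsilon):
    """Return pi(a|s) for a single state under an epsilon-greedy policy.

    Assumption: non-greedy actions each get epsilon / |A| (the excerpt's
    formula is truncated, so this branch is the standard textbook form).
    """
    q_values = np.asarray(q_values, dtype=float)
    n_actions = q_values.size
    # Every action receives the exploration share epsilon / |A|.
    probs = np.full(n_actions, epsilon / n_actions)
    # The greedy action (first argmax on ties) additionally gets 1 - epsilon.
    probs[np.argmax(q_values)] += 1.0 - epsilon
    return probs


def sample_action(q_values, epsilon, rng):
    """Draw one action index from the epsilon-greedy distribution."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    return int(rng.choice(len(probs), p=probs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = [1.0, 3.0, 2.0]
    print(epsilon_greedy_probs(q, epsilon=0.3))
    print(sample_action(q, epsilon=0.3, rng=rng))
```

Sampling from the explicit distribution is one of several ways to realise the policy. An alternative that first draws a uniform number and then either explores uniformly or picks the argmax yields the same probabilities, provided the uniform exploration branch can also select the greedy action.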
1 vote
1 answer
121 views
RL agent for autonomous vehicle is able to follow the road but can't avoid crashing at all (Highway-Env / Racetrack Env.)
I coded some deep RL algorithms (DQN and SAC) with tf2/keras to solve an environment where a vehicle needs to follow the track and avoid crashing into one other vehicle (there is only one other ...
1 vote
1 answer
478 views
Always getting the same action from an A2C model in stable_baselines3
I'm quite new to RL and have been trying to train an A2C model from stable_baselines3 to derive an integer sequence based on 3 other input sequences of floats. I have a custom gym environment that ...
1 vote
1 answer
603 views
What is the problem in my implementation of actor critic?
I have been implementing both REINFORCE with baseline and actor-critic to solve "cartpole-v1". As a reminder, here is the presentation of the algorithms in Sutton and Barto's book (http://...
1 vote
1 answer
443 views
OpenAI Gym. Train problem: invalid values [closed]
I have a problem with my reinforcement learning model. I am trying to simulate an electric battery storage. To keep it as simple as possible, the efficiencies of charge, storage and discharge are 100%. ...
3 votes
0 answers
152 views
Are there Reinforcement Learning algorithms specialized for the case $\gamma=0$?
I have a Reinforcement Learning problem where the optimal policy does not depend on the next state (i.e., gamma equals 0). I think this means that I only need an efficient exploration algorithm coupled ...
0 votes
1 answer
99 views
What would the "state space" and its Python implementation be for my simulation?
Context: I'm trying to build a social-consensus simulation involving two intelligent agents. The simulation involves a graph/network of nodes. Nearly all of these nodes (> 90%) will be green agents. ...